Statistical patterns of visual search for hidden objects 

The movement of the eyes has been the subject of intensive research as a way to elucidate inner
mechanisms of cognitive processes. A cognitive task that is rather frequent in our daily life is the
visual search for hidden objects. Here we investigate through eye-tracking experiments the statistical
properties associated with the search for target images embedded in a landscape of distractors.
Specifically, our results show that the twofold process of eye movement, composed of sequences of
fixations (small steps) intercalated by saccades (longer jumps), displays characteristic statistical
signatures. While the saccadic jumps follow a log-normal distribution of distances, which is typical
of multiplicative processes, the lengths of the smaller steps in the fixation trajectories are consistent
with a power-law distribution. Moreover, the present analysis reveals a clear transition from a
directional serial search to an isotropic random movement as the difficulty level of the searching
task is increased.

PACS numbers: 87.19.lt, 89.75.Da, 05.40.Fb

It is a common misconception that memories are stored in the brain the same way a movie is stored on
a hard drive. Remembering, just like seeing and listening, is in fact an act of construction much more complex than
usually thought, where vast amounts of information are processed and interpreted by the brain in order to create
what we call memories, and pretty much everything else we call reality [1]. The field dedicated to the study of these
types of processes is called Cognitive Science, which took its current form in the first half of the 20th century out
of a mishmash of sciences, including, among others, Psychology, Linguistics and Computer Science. More precisely,
the main challenge of Cognitive Science is to answer questions related to the way in which the brain processes
available information and how this shapes behaviour [5].

As theoretical entities, cognitive processes cannot be directly observed and measured [3]. Thus, in order to
study them, we need to rely on observations about the behaviour of individuals. A frequently used approach
is to follow the eye movement during cognitive tasks. By the end of the 19th century, it was still thought that the
eyes smoothly scanned the line of text during reading. Louis Émile Javal, in his unprecedented study of 1879,
observed that the eyes actually move in a succession of steps, called fixations, followed by jerk-like movements, called
saccades, that are too fast to capture new visual information [5]. The method of eye-tracking as a fundamental source
of information about cognition was finally introduced through the seminal work of Yarbus [6]. This study provided
an unambiguous demonstration of the fact that the movement of the eyes is strongly correlated with the cognitive
objectives of the individual.

A cognitive process that benefits the most from the study of eye movement is the visual search for hidden objects
[7], as when trying to find a person in a crowded place, or a 2-inch nail inside a box of nails of various sizes. An
early theory related to this process, called Feature Integration Theory (FIT), is due to Treisman and Gelade [8].
This theory deals with attention, a kind of mental focus that can be directed to a desired region of the visual scene,
therefore enhancing the perceptual sensitivity in that region. The FIT proposes that visual search tasks are divided
into two stages. The first is a detection stage, in which a small set of simple separable features like color, size and
orientation are identified in the elements inside the optical array. This stage is a preattentive one, that is, attention
need not be directed at each element of the image in order to perform detection; all feature registration takes place in
parallel across the whole visual scene. In the second stage, called integration, the features identified in the previous
stage are combined in order to conceive more complex characteristics. This is an attentive stage, and thus it is much
slower, requiring the observer to scan each element of the image serially.

It is interesting to note that the FIT resembles a broader category of paradigms, namely the dual process models
[5]. Under this conceptual framework, complex cognitive tasks usually involve two systems, which essentially differ
in what Kahneman [10] referred to as (1) effortless intuition and (2) deliberate reasoning. System (1) comprises
processes that are fast, intuitive and can be performed automatically and in parallel, as when trying to identify
the state of mind of a person based on her/his facial expression. These processes are acquired through habit, being
usually inflexible and hard to control or modify. System (2), on the other hand, is characterized by slow, serial
but extremely controlled processes, which is the case, for example, when one tries to solve a mathematical equation.

The FIT was thoroughly studied and expanded during subsequent years [11]. Although regarded, in its initial
form, as an oversimplification [12], it surely represents a formidable conceptual starting point for research on
the subject. Two particular assumptions of the FIT, namely that the whole optical array is homogeneously analyzed
and that attention can be displaced independently of the eye movements (thus being called covert attention), are
worth examining further [13]. It is widely known that visual acuity falls rapidly away from the point of fixation [14],
with high acuity confined to a small region called the fovea. While there is little doubt about the existence of covert
attention [15], it has been argued that situations in which covert attention performs better than overt eye movements
are unusual and restricted to laboratory tests [13]. These arguments led to further investigation of the function of
eye movements in visual search [16-18].

Eye movements are composed of fixations and saccades, but even during fixations the eyes are not completely
still. In fact, fixational eye movements (FEyeM) include drift, tremor and microsaccades. The drift corresponds to
the erratic and low-velocity component of FEyeM. The tremor is irregular and noise-like with very high frequency,
while microsaccades correspond to small rapid shifts in eye position akin to saccades, but preferentially taking place
in horizontal and vertical directions [19]. Whether or not each one of these movements plays an effective role in visual
cognition still represents a rather controversial issue [20-22], but it is widely known that, if the FEyeM halt,
visual perception stops completely. Previous attempts to model eye movements have been mainly devoted to describing
the sequence of fixations and saccades in terms of stochastic processes [33] like regular random walks. Very often,
the gaze is considered as a random walker subjected to a potential extracted from a saliency map, namely a field that
depends on the particular features of the image under inspection, such as color, intensity and orientation [25-28].

Recent research on visual cognition has been directed to the development of experimental and analytical methods
that can potentially elucidate the interplay between different components of cognitive activities, and how their
interactions give rise to cognitive performance [55]. While the detection and integration processes mentioned before
represent basic components of visual cognition that can be investigated separately, the way they interact should be
relevant for the comprehension of more intricate visual tasks. Therefore, it is of paramount interest to determine whether
cognitive dynamics is dominated by components or by interactions. Here we show through eye-tracking experiments that
the cognitive task of visual search for hidden objects displays typical statistical signatures of interaction-dominated
processes. Interestingly, by increasing the difficulty level of the visual task, our results also indicate that the eye
movement changes from a serial reading-like (systematic) to an isotropic (random) searching strategy.

RESULTS 

Visual search experiments have been performed with targets hidden in two different types of disordered substrate
images (see Methods for details). In the first, as depicted in Fig. 1, the subjects were asked to search for a target
(number 5) in an image with distractors (numbers 2) placed on a regular array. Figure 2 shows an example of the
second type of test, where we utilized images from the book series "Where's Wally?" [30]. The latter can be considered
very complex images, since distractors are irregularly placed in an off-lattice configuration and specially drawn to
closely resemble the target. Images designed under these conditions frequently lead to a searching task
of enhanced difficulty. The analysis of the results from the two tests enabled us to identify general statistical patterns
as well as particular features in the eye movement that are related to the irregularity and complexity of the image
adopted in the eye-tracking experiments.

In the case of the 5-2 lattice tests, the typical trajectories shown in Fig. 1 indicate that, when the number of
distractors is small, most subjects performed systematic searches, that is, the task is accomplished in a manner
that resembles a person reading a text, for example, from left to right and/or from top to bottom. By increasing
the number of distractors, a transition can be observed from this directional (systematic) trajectory to an isotropic
random strategy of searching for the large majority of the experiments. Precisely, systematic patterns have been
observed in two thirds of the eye-tracking recordings (42 out of 63) for difficulty 0 (see Fig. 1a), half of the recordings
(16 out of 32) for difficulty 1 (see Fig. 1b), and only one fourth of the recordings (6 out of 24) for difficulty 2 (see
Fig. 1c). No discernible systematic searches were observed in the case of the "Where's Wally?" tests.

Next, we analyze the size distributions of gaze jumps calculated for the raw data obtained from eye-tracking
experiments. By definition, the size of a jump in this case corresponds to the distance, measured in number of
pixels, covered by the eye gaze during each recording step of the eye-tracker device, set here to approximately
17 milliseconds. Strikingly, as depicted in Fig. 3, all tests produced similar distributions of gaze jumps, regardless
of the subjects, complexity of the tests, or the search strategy (regular or random). This universal shape reflects
the fixation-saccade duality of the eye movement and clearly points to a superposition of behaviours instead of a
description in terms of pure monomodal distributions [31, 32].
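
The jump-size definition above can be made concrete with a short sketch. It assumes that a recording is available as a list of (x, y) gaze positions in pixels, one per eye-tracker sample; the function names and the use of logarithmic bins are illustrative assumptions, not part of the original analysis.

```python
import math

def jump_sizes(gaze_xy):
    """Euclidean distance (pixels) between consecutive gaze samples."""
    return [math.dist(p, q) for p, q in zip(gaze_xy, gaze_xy[1:])]

def log_binned_density(sizes, n_bins=30):
    """Normalized histogram of positive jump sizes on logarithmically spaced bins.

    Returns (geometric bin centers, densities). Zero-length jumps are dropped
    because they cannot be placed on a logarithmic axis; at least two distinct
    positive sizes are required.
    """
    data = [s for s in sizes if s > 0]
    lo, hi = math.log(min(data)), math.log(max(data))
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in data:
        # Clamp so that the largest jump falls into the last bin.
        k = min(int((math.log(s) - lo) / width), n_bins - 1)
        counts[k] += 1
    centers, density = [], []
    for k, c in enumerate(counts):
        left = math.exp(lo + k * width)
        right = math.exp(lo + (k + 1) * width)
        centers.append(math.sqrt(left * right))
        density.append(c / (len(data) * (right - left)))
    return centers, density

# Hypothetical four-sample recording, one sample every ~17 ms.
demo_sizes = jump_sizes([(0, 0), (3, 4), (3, 4), (9, 12)])
```

Plotting `density` against `centers` on doubly logarithmic axes would give a curve of the kind described for the raw data, provided the input really mixes short fixational steps with long saccadic jumps.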

FIG. 1: Search over 5-2 lattices. The subjects try to find a single number 5 in an array of red and green numbers 2
(distractors). Of course, the larger the size of the array (number of distractors), the more difficult the searching task. Two
distinct types of searching patterns are clearly observed. The systematic search shows an anisotropic characteristic, as in (a)
and (b), with the eye moving more frequently in a particular direction, horizontally most often, but also vertically for some
subjects. In the case of random search, as in (c), the eyes are likely to move equally in any direction. Our results also show that
the frequency with which the subjects follow the systematic pattern decreases with the difficulty of the test. Two thirds of the
recordings (42 out of 63), in the case of difficulty 0, correspond to systematic searches, while only half of the recordings (16 out
of 32) displayed this behaviour with tests of difficulty 1. In the presence of a large number of distractors, most of the subjects
prefer to follow a random search strategy. This is the case with tests of difficulty 2, where only one fourth of the recordings (6
out of 24) show systematic behaviour.

The presence of two modes separated by a slight depression that marks the overlap region can be observed in
practically all jump size distributions of the raw data. Such behaviour strongly suggests the need for a filtering
process through which fixations and saccades can be adequately identified and their statistical properties independently
studied. For this purpose, we apply a modified version of the fixation filter developed by Olsson [33], as described
in the Methods section. As shown in Figs. 4 and 5, the resulting distributions of jump sizes for fixational movements
obtained for the 5-2 and "Where's Wally?" tests, respectively, also display the same statistical signature. Precisely, for
gaze steps larger than 10 px, the distances $\Delta r$ follow typical power-law distributions,

$$P(\Delta r) \propto \Delta r^{-\alpha}, \qquad (1)$$

with approximately the same exponent, $\alpha \approx 2.9$, for all tests (see Table I). For gaze steps smaller than 10 px, the
distributions display approximately uniform behaviour, possibly due to the fact that, at this scale, eye tremor is of
the order of drift, although this hypothesis cannot be tested with the time resolution used in our measurements [34].
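
As an illustration of how a tail exponent of this kind can be estimated, the sketch below uses the continuous maximum-likelihood estimator for a power law above a cutoff. This is a swapped-in technique: the values in Table I were obtained by non-linear least squares, not by this estimator. The function names, the synthetic sample and the cutoff argument `r_min` are assumptions made for the example.

```python
import math
import random

def powerlaw_exponent_mle(sizes, r_min=10.0):
    """Maximum-likelihood estimate of alpha for P(r) ~ r**(-alpha), r >= r_min.

    Returns (alpha, approximate standard error).
    """
    tail = [r for r in sizes if r >= r_min]
    if len(tail) < 2:
        raise ValueError("need at least two samples above r_min")
    log_sum = sum(math.log(r / r_min) for r in tail)
    alpha = 1.0 + len(tail) / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(len(tail))

def sample_powerlaw(n, alpha, r_min=10.0, seed=0):
    """Draw n synthetic step lengths from a pure power law (inverse-transform sampling)."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so the power below is always finite.
    return [r_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Synthetic check: the estimator should recover the exponent used to generate the data.
synthetic = sample_powerlaw(20000, alpha=2.9)
alpha_hat, alpha_err = powerlaw_exponent_mle(synthetic)
```

On real fixational steps the estimate depends on the chosen cutoff, so repeating the fit for a few values of `r_min` around 10 px is a simple way to check whether the tail is stable.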
Once the saccadic movements have been identified through the filtering process, their analysis in all tests reveals that the
distributions of sizes for this type of eye jump can be well described in terms of a log-normal distribution,



$$P(\Delta r) = \frac{1}{\Delta r \sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\log \Delta r - \mu)^2}{2\sigma^2}\right], \qquad (2)$$



where the parameters $\mu$ and $\sigma^2$ correspond to the mean and variance of the logarithm of the saccade length,
respectively. Once more, the fact that a single distribution function can properly describe the general statistical features of
different searching tests suggests that the same underlying mechanisms control the cognitive task under investigation. It
is interesting to note, however, that the numerical values of the estimated parameters of the distributions depend on
the details of the test. For instance, in the case of the 5-2 tests, the mode of the distribution (the most probable length)
decreases systematically with the difficulty of the searching task, indicating that saccadic movements somehow adapt
to the complexity of the image.
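
The statement about the mode can be checked directly from the fitted parameters, since a log-normal density of the form of Eq. (2) peaks at $\exp(\mu - \sigma^2)$. The sketch below is a minimal illustration under two assumptions: the tabulated $\sigma$ is the quantity that enters Eq. (2) squared, and the simple moment-based `lognormal_fit` stands in for the least-squares fit actually used. All function names are hypothetical.

```python
import math

def lognormal_fit(lengths):
    """Return (mu, sigma): mean and standard deviation of log(length)."""
    logs = [math.log(x) for x in lengths if x > 0]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(var)

def lognormal_pdf(r, mu, sigma):
    """Log-normal density in the form of Eq. (2)."""
    norm = r * math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.exp(-((math.log(r) - mu) ** 2) / (2.0 * sigma ** 2)) / norm

def lognormal_mode(mu, sigma):
    """Most probable length of a log-normal distribution."""
    return math.exp(mu - sigma ** 2)

# Central values of (mu, sigma) as listed in Table I.
TABLE_I = {
    "Difficulty 0": (4.810, 0.675),
    "Difficulty 1": (4.594, 0.727),
    "Difficulty 2": (4.510, 0.830),
    "Wally": (4.444, 0.698),
}
modes = {name: lognormal_mode(mu, sigma) for name, (mu, sigma) in TABLE_I.items()}
```

Under this reading of the table, the modes for the 5-2 tests come out at roughly 78, 58 and 46 px for difficulties 0, 1 and 2, which reproduces the decreasing trend described in the text.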



                Fixations          Saccades
                $\alpha$           $\mu$              $\sigma$
Difficulty 0    2.951 ± 0.054      4.810 ± 0.050      0.675 ± 0.415
Difficulty 1    2.825 ± 0.020      4.594 ± 0.018      0.727 ± 0.168
Difficulty 2    2.938 ± 0.019      4.510 ± 0.019      0.830 ± 0.173
Wally           3.091 ± 0.017      4.444 ± 0.012      0.698 ± 0.112

TABLE I: Parameters of the jump size distributions obtained from the non-linear least squares fitting to the
filtered data. The fixational steps follow a power-law, $P(\Delta r) \propto \Delta r^{-\alpha}$, for $\Delta r > 10$ px, while the saccadic jumps display a
log-normal type of behaviour, $P(\Delta r) = \exp[-(\log \Delta r - \mu)^2 / 2\sigma^2] / (\Delta r \sqrt{2\pi\sigma^2})$. The errors represent a bootstrap estimation of the
95% confidence interval [45].
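
For readers unfamiliar with bootstrap error bars, the following sketch shows one common variant, the percentile bootstrap. Whether the intervals in Table I were computed with this exact variant is not stated, so the procedure, the function name `bootstrap_ci` and the synthetic data are all assumptions.

```python
import random
import statistics

def bootstrap_ci(data, estimator, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(data)."""
    rng = random.Random(seed)
    # Re-estimate the parameter on resamples drawn with replacement.
    estimates = sorted(
        estimator(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = estimates[int((1.0 - level) / 2.0 * n_resamples)]
    upper = estimates[int((1.0 + level) / 2.0 * n_resamples) - 1]
    return lower, upper

# Hypothetical data: 500 synthetic log-lengths standing in for measured saccades.
rng = random.Random(1)
log_lengths = [rng.gauss(4.5, 0.7) for _ in range(500)]
low, high = bootstrap_ci(log_lengths, statistics.fmean)
```

Any estimator that maps a list of numbers to a single value can be passed in place of `statistics.fmean`, for example a function returning a fitted exponent.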




FIG. 2: Where's Wally? On the left, we show the result of a typical eye-tracker recording of a search task on a complex
landscape [30]. The basic elements of eye movement are clearly present, namely the numerous sets of fixation points connected
by large jumps (saccades) [5]. In this particular searching test, the points of fixation bunch up around some regions, where
certain details of the image demand more attention than others; however, one cannot perceive any systematic pattern in the
trajectory of the eyes. On the right, part of the image is enlarged, where the red circle indicates the location of the target. The
recording process ends when the target is found.

FIG. 3: Log-log plots of the distributions of gaze jump sizes. The size of a jump corresponds to the distance, measured
in number of pixels, covered by the eye gaze in an interval of 17 milliseconds. The logarithmic plots on the top are the
distributions of jump sizes obtained for different subjects and each of the tests performed, namely "Where's Wally?" and 5-2 
lattices with difficulties 0, 1 and 2. Despite small variations, all distributions show a similar quantitative behaviour. This 
"universal" statistical signature of the searching process can also be detected from the results displayed on the bottom panel, 
where the distributions of gaze jump sizes averaged over all subjects are shown. The similar shape observed in all distributions 
suggests that identical mechanisms control the amplitude of the gaze shift, regardless of the systematic or random aspects of 
the searching movements on the 5-2 lattices, and the distinctive features of the underlying arrangement of distractors that 
compose the "Where's Wally?" landscapes. The shaded area delimits a depression region that appears systematically in all 
distributions, where the sizes of fixational and saccadic eye movements overlap. 



DISCUSSION 



In summary, our results from eye-tracking tests in which subjects are asked to find a specific target hidden among
a set of distractors (see Methods) reveal a gradual change in the searching strategy, from a directional reading-like
(systematic) to an isotropic (random) movement as the number of distractors increases. However, regardless of the
differences in image complexity, searching tasks and individual skills of the subjects, we observe universal statistical
features related to the distributions of gaze jump sizes. These distributions generally show a characteristic bimodal
behaviour, a consequence of the intrinsic dual nature of eye movement [32], which alternates between saccades and
fixations.

The application of a fixation filter to the raw data enables us to study separately the distributions of jump sizes for
fixational and saccadic gaze steps. We find that the distributions of fixational movements show long tails which obey
power-laws [35], while saccades, on the other hand, follow a log-normal type of behaviour. The fact that both
log-normal and power-law distributions arise from multiplicative processes [36] provides strong support to the hypothesis
that the interactions between components dominate the cognitive task of visual search [37]. In a dynamics governed
by interactions, the organization of the components and the way they process information are context dependent,
with no particular function being encapsulated in any of the components themselves. This non-linear response to the
influx of information would give rise to multiplicative distributions like the ones we disclosed here.

FIG. 4: Jump size distributions of filtered data for the 5-2 searching tasks. For each difficulty level, the corresponding
data has been averaged over all subjects. By applying the filter developed by Olsson [33] to the data, we can distinguish
between fixational and saccadic movements, so that the jump size distributions for each mechanism of eye movement can be
studied separately. For better visualization, the resulting distribution curves are shifted vertically by a factor of 1/16 and 16,
in the case of difficulty 0 (black) and difficulty 2 (orange), respectively. As depicted, the tails of the fixation curves can be
adequately fitted by power-laws, $P(\Delta r) \propto \Delta r^{-\alpha}$ (dashed lines), with an exponent, $\alpha \approx 2.9$, for all tests (see Table I). The
distributions of jump sizes for the saccadic movements follow a quite different behaviour, which is compatible with a log-normal
distribution, as the best fits to the three data sets show (dashed lines) on the left panel. The corresponding fitting parameters
are presented in Table I. Interestingly, the most frequent length of the saccades (arrows) decreases with the difficulty of the
test. This could possibly happen either because the distractors (numbers 2) are simply smaller in the more difficult tests or
because the saccades are influenced by the colors of the distractors (that form relatively smaller clusters in more difficult
settings), or because of a conjunction of both effects.

These observations are in evident contrast with a component-based scenario, where the final performance of a given
cognitive task results from the simple addition of sub-tasks that usually process information in a specialized manner.
Instead of log-normal or power-law distributions, a process like this would give rise to Gaussian or other additive
distributions (e.g., exponential or gamma distributions) [37]. It is worth noting that our results are conceptually
consistent with previous studies describing complex behaviour in visual cognition [38-40]. As a perspective for
future work, it would be interesting to relate our findings to other potential approaches based on non-cognitive
random strategies, where the searching task can be the result of an optimization process [41-44].
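
The contrast between additive and multiplicative combinations of components can be illustrated with a toy simulation. It is not the authors' analysis: the uniform factors, the sample sizes and the function names are arbitrary assumptions, and the example covers only the log-normal case, not the power-law one. The same random positive factors are combined once by summation and once by multiplication, and the skewness of each outcome is compared.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness (third standardized moment); close to 0 for symmetric data."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

def simulate(n_samples=5000, n_factors=50, seed=0):
    """Combine the same random positive factors additively and multiplicatively."""
    rng = random.Random(seed)
    sums, products = [], []
    for _ in range(n_samples):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_factors)]
        sums.append(sum(factors))          # additive combination
        products.append(math.prod(factors))  # multiplicative combination
    return sums, products

sums, products = simulate()
log_products = [math.log(p) for p in products]
```

The sums are nearly symmetric, as expected for a Gaussian limit, whereas the products are strongly right-skewed and only become nearly symmetric after taking logarithms, which is the signature of a log-normal distribution.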

METHODS 
Equipment 

Eye movements were recorded with a Tobii T120 eye-tracking system (Tobii Technology). In this study we only 
consider data obtained after a valid calibration protocol is applied to both eyes of the subject. The stimuli were 
presented on a 17" TFT-LCD monitor with resolution 1024 x 1280 pixels and capture rate of 60 Hz. 

Tests 

Two types of tests consisting of visual searching for a hidden target randomly placed among a set of distractors were
performed by 11 healthy subjects with an average age of 23 years. The stimuli of the first test consist of a square
lattice composed of a single target number 5 and several number 2's serving as distractors. All numbers (target and
distractors) are randomly colored red or green, hindering the visual detection of the target through the identification
of patterns in the peripheral vision. These images were organized in three difficulty levels according to the number of
distractors, labeled 0, 1 and 2 for 207, 587 and 1399 distractors, respectively.

The stimuli of the second test are scanned images from the "Where's Wally?" series of books [30]. The complexity
of these images, where a large number of distractors (background characters) are irregularly placed together with
Wally, the hidden target character, explains the high difficulty involved in this visual searching task. Not all images
used had an actual target, since we had no intention to track the time taken to find the target. Instead, our objective
was to induce the subjects to perform the searching task as naturally as possible.

FIG. 5: Jump size distributions of filtered data for the "Where's Wally?" searching tasks. The top panels on
the left and right show the results for fixational and saccadic gaze jumps, respectively, calculated for different subjects. On the
bottom, we show the same distributions, but now averaged over all subjects. The dashed lines correspond to the best fits to the
data sets of power-laws, for the fixational movements, and log-normal distributions, for the saccadic movements. The statistical
features of both mechanisms of eye movement observed here are quite similar to the ones identified for the 5-2 searching tests
(see Fig. 4).

In order to stimulate subjects to search efficiently, in all tests they were told that they had a limited time to find the
target, but were not informed exactly how much time would be available. In the case of the 5-2 lattice tests, 1, 1.5 and
2 minutes were given to search for the target for the difficulties 0, 1 and 2, respectively. For the "Where's Wally?" tests,
the subjects had 2 min. A summary of the parameters can be found in Table II.



Difficulty   # of Images   # of Distractors   Size of Distractors   Time Available
0            8             16 x 13            76 px                 1 min
1            4             33 x 26            38 px                 1 min 30 s
2            3             40 x 35            25 px                 2 min
Wally        5             -                  -                     2 min

TABLE II: Parameters used in the 5-2 and "Where's Wally?" search tasks.



Fixation Filter 



We adopted a modified version of the fixation filter developed by Olsson [33] in order to identify which gaze points
belong to fixations and which belong to saccades. The basic idea is to distinguish segments of the signal that
are moving slowly due to drift, thus identified as part of a fixational sequence, from those moving faster, which constitute
the saccades. This is achieved here by taking the raw signal output, $s_i$, namely the position of the gaze captured at
each timestamp $i$, and calculating for each point the mean position of two sliding windows of size $r$, one retarded and
the other advanced,

$$\bar{s}_i^{\pm} = \frac{1}{r} \sum_{k=1}^{r} s_{i \pm k}. \qquad (3)$$

The distance between them is calculated as

$$d_i = \left| \bar{s}_i^{+} - \bar{s}_i^{-} \right|. \qquad (4)$$

Since each timestamp has the same duration, the displacement given by Eq. (4) may be analyzed in the same way as
the average velocity; thus, if $d_i$ is larger than its two neighbors ($d_{i-1}$ and $d_{i+1}$) and is also larger than a given velocity
threshold, it is considered a peak. If two peaks are found within the interval of a single window, only the largest one
is considered.

At this stage, the gaze points are divided into clusters separated by the peaks. In the original filter [33], the median
position of each cluster is used to locate the corresponding fixation. Since we are instead interested in separating the
gaze points that correspond to fixations from those that belong to saccades, the radius of gyration for each cluster $C$
is then calculated as

$$R_g = \sqrt{\frac{1}{N_C} \sum_{i \in C} \left| s_i - \bar{s} \right|^2}, \qquad (5)$$

where $\bar{s}$ is the mean position of the gaze points that belong to $C$ and $N_C$ is their number. Steps that fall inside
the circle of radius $R_g$ centered at $\bar{s}$ are considered to be fixational. The same applies to those steps that leave
this area but return to it without passing through another fixation cluster. All other steps are considered saccadic
jumps.
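
The steps of the filter described above can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the function names, the tie-breaking rule for flat maxima, the choice of assigning each peak sample to the preceding cluster, and the way the "leave and return" rule is applied (every step between the first and last in-circle samples of a cluster counts as fixational) are all assumptions. Gaze samples are taken to be (x, y) pairs in pixels.

```python
import math

def mean_point(points):
    """Mean (x, y) position of a list of gaze points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def window_distances(gaze, r):
    """Distance between the mean positions of the r samples after and before each sample."""
    d = [0.0] * len(gaze)  # stays zero near the ends, where a full window does not fit
    for i in range(r, len(gaze) - r):
        before = mean_point(gaze[i - r:i])
        after = mean_point(gaze[i + 1:i + r + 1])
        d[i] = math.dist(before, after)
    return d

def find_peaks(d, r, threshold):
    """Local maxima of d above threshold; within r samples only the largest is kept."""
    peaks = []
    for i in range(1, len(d) - 1):
        # Assumption: ">=" on the right side lets a flat-topped maximum count once.
        if d[i] > threshold and d[i] > d[i - 1] and d[i] >= d[i + 1]:
            if peaks and i - peaks[-1] <= r:
                if d[i] > d[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks

def classify_steps(gaze, r, threshold):
    """Label each step gaze[i] -> gaze[i + 1] as "fixation" or "saccade"."""
    peaks = find_peaks(window_distances(gaze, r), r, threshold)
    # Assumption: the peak sample closes the cluster that precedes it.
    bounds = [0] + [p + 1 for p in peaks] + [len(gaze)]
    labels = ["saccade"] * (len(gaze) - 1)
    for start, stop in zip(bounds, bounds[1:]):
        cluster = gaze[start:stop]
        if len(cluster) < 2:
            continue
        center = mean_point(cluster)
        # Radius of gyration: root-mean-square distance to the cluster mean.
        rg = math.sqrt(sum(math.dist(p, center) ** 2 for p in cluster) / len(cluster))
        inside = [k for k, p in enumerate(cluster) if math.dist(p, center) <= rg]
        if not inside:
            continue
        # Steps between the first and last in-circle samples, including
        # excursions that return to the circle, are treated as fixational.
        for k in range(inside[0], inside[-1]):
            labels[start + k] = "fixation"
    return labels
```

With labels in hand, the jump sizes of the steps tagged as fixational and saccadic can be histogrammed separately, which is the purpose the filter serves in the analysis.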